57 research outputs found

The Descriptive/Procedural Distinction is Flawed

Author: Renear Allen H.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2001

The traditional distinction between descriptive and procedural markup is flawed; it conflates two different dimensions, mood and domain, which in fact can vary independently. An adequate markup taxonomy must, among other things, incorporate distinctions such as those developed in contemporary “speech-act theory”. This will substantially complicate, although in interesting ways, the development of an adequate theory of markup semantics, as formalization will require modal operators and additional axiomatic relationships. In addition, these reflections reveal that there are foundational issues in markup theory that are not yet resolved, in particular the precise relationship between markup and text.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Defining Textual Entailment

Author: Jett Jacob
Korman Daniel Z.
Mack Eric
Renear Allen H.
Publication date: 01/01/2018

Textual entailment is a relationship that obtains between fragments of text when one fragment in some sense implies the other fragment. The automation of textual entailment recognition supports a wide variety of text-based tasks, including information retrieval, information extraction, question answering, text summarization, and machine translation. Much ingenuity has been devoted to developing algorithms for identifying textual entailments, but relatively little to saying what textual entailment actually is. This article is a review of the logical and philosophical issues involved in providing an adequate definition of textual entailment. We show that many natural definitions of textual entailment are refuted by counterexamples, including the most widely cited definition of Dagan et al. We then articulate and defend the following revised definition: T textually entails H =df typically, a human reading T would be justified in inferring the proposition expressed by H from the proposition expressed by T. We also show that textual entailment is context-sensitive, nontransitive, and nonmonotonic.

PhilPapers

eScholarship - University of California

Sustaining Collection Value: Managing Collection/Item Metadata Relationships

Author: Dubin David
Palmer Carole L.
Renear Allen H.
Urban Richard J.
Wickett Karen M.
Publication date: 01/06/2008

Many aspects of managing collection/item metadata relationships are critical to sustaining collection value over time. Metadata at the collection level not only provides context for finding, understanding, and using the items in the collection, but is often essential to the particular research and scholarly activities the collection is designed to support. Contemporary retrieval systems, which search across collections, usually ignore collection-level metadata. Alternative approaches, informed by collection-level information, will require an understanding of the various kinds of relationships that can obtain between collection-level and item-level metadata. This paper outlines the problem and describes a project that is developing a logic-based framework for classifying collection-level/item-level metadata relationships. This framework will support (i) metadata specification developers defining metadata elements, (ii) metadata librarians describing objects, and (iii) system designers implementing systems that help users take advantage of collection-level metadata.

Funding: Institute for Museum and Library Services (Grant #LG06070020)
Status: published or submitted for publication; peer reviewed

Illinois Digital Environment for Access to Learning and Scholarship Repository

Will Formal Preservation Models Require Relative Identity?

Author: Renear Allen H.
Sacchi Simone
Wickett Karen M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012

The problem of identifying and re-identifying data puts the notion of “same data” at the very heart of preservation, integration and interoperability, and many other fundamental data curation activities. However, it is also a profoundly challenging notion because the concept of data itself clearly lacks a precise and univocal definition. When science is conducted in small communicating groups, with homogeneous data, these ambiguities seldom create problems and solutions can be negotiated in casual real-time conversations. However, when the data is heterogeneous in encoding, content and management practices, these problems can produce costly inefficiencies and lost opportunities. We consider here the relative identity view, which apparently provides the most natural interpretation of common identity statements about digitally-encoded data. We show how this view conflicts with the curatorial and management practice of “data” objects, in terms of their modeling, and common knowledge representation strategies.

Columbia University Academic Commons

Definitions of Dataset in the Scientific and Technical Literature

Author: Renear Allen H.
Sacchi Simone
Wickett Karen M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2010

The integration of heterogeneous data in varying formats and from diverse communities requires an improved understanding of the concept of a dataset, and of key related concepts, such as format, encoding, and version. Ultimately, a normative formal framework of such concepts will be needed to support the effective curation, integration, and use of shared multi-disciplinary scientific data. To prepare for the development of this framework we reviewed the definitions of dataset found in technical documentation and the scientific literature. Four basic features can be identified as common to most definitions: grouping, content, relatedness, and purpose. In this summary of our results we describe each of these features, indicating the directions a more formal analysis might take.

Columbia University Academic Commons

A Framework for Applying the Concept of Significant Properties to Datasets

Author: Dubin David
Renear Allen H.
Sacchi Simone
Wickett Karen M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011

The concept of significant properties, properties that must be identified and preserved in any successful digital object preservation, is now common in data curation. Although this notion has clearly demonstrated its usefulness in cultural heritage domains, its application to the preservation of scientific datasets is not as well developed. One obstacle to this application is that the familiar preservation models are not sufficiently explicit to identify the relevant entities, properties, and relationships involved in dataset preservation. We present a logic-based formal framework of dataset concepts that provides the levels of abstraction necessary to identify and correctly assign significant properties to their appropriate entities. A unique feature of this model is that it recognizes that a typed symbol structure is a unique requirement for datasets, but not for other information objects.

Columbia University Academic Commons

One Thing is Missing or Two Things are Confused: An Analysis of OAIS Representation Information.

Author: Dubin David
Renear Allen H.
Sacchi Simone
Wickett Karen M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011

We describe two alternative interpretations of OAIS Representation Information (CCSDS, 2002), and show that both are flawed. The first is insufficient to formalize a model of preservation, and the second leads to category mistakes in conceptualizing the nature of digital artifacts. This analysis is based on earlier work developing a framework for the application of significant properties to datasets (Sacchi et al., 2011).

Columbia University Academic Commons

Foundations of Data Curation: The Pedagogy and Practice of "Purposeful Work" with Research Data

Author: Muñoz Trevor
Palmer Carole
Renear Allen H.
Weber Nicholas M.
Publication venue: Archives Journal
Publication date: 01/01/2013

Increased interest in large-scale, publicly accessible data collections has made data curation critical to the management, preservation, and improvement of research data in the social and natural sciences, as well as the humanities. This paper explicates an approach to data curation education that integrates traditional notions of curation with principles and expertise from library, archival, and computer science. We begin by tracing the emergence of data curation as both a concept and a field of practice related to, but distinct from, both digital curation and data stewardship. This historical account, while far from definitive, considers perspectives from both the sciences and the humanities. Alongside traditional LIS and archival science practices, unique aspects of curation have informed our concept of “purposeful work” with data and, in turn, our pedagogical approach to data curation for the sciences and the humanities.

Illinois Digital Environment for Access to Learning and Scholarship Repository

A Vision for User-Defined Semantic Markup

Author: Iorio Angelo Di
Peroni Silvio
Renear Allen H.
Rice Stanley
Sperberg-McQueen C. M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/09/2019

Typesetting systems, such as LaTeX, permit users to define custom markup and corresponding formatting to simplify authoring, ensure the consistent presentation of domain-specific recurring elements and, potentially, enable further processing, such as the generation of an index of such elements. In XML-based and similar systems, the separation of content and form is also reflected in the processing pipeline: while document authors can define custom markup, they cannot define its semantics. This could be said to be intentional to ensure structural integrity of documents, but at the same time it limits the expressivity of markup. The latter is particularly true for so-called lightweight markup languages like Markdown, which only define very limited sets of generic elements. This vision paper sketches an approach for user-defined semantic markup that could permit authors to define the semantics of elements by formally describing the relations between its constituent parts and to other elements, and to define a formatting intent that would ensure that a default presentation is always available.

Crossref

Serveur académique lausannois

Classificatory Theory in Data-Intensive Science: The Case of Open Biomedical Ontologies

Publication status: Published
Type: Article

This is the author's version of a paper that was subsequently published in International Studies in the Philosophy of Science. Please cite the published version by following the DOI link.

Knowledge-making practices in biology are being strongly affected by the availability of data on an unprecedented scale, the insistence on systemic approaches and growing reliance on bioinformatics and digital infrastructures. What role does theory play within data-intensive science, and what does that tell us about scientific theories in general? To answer these questions, I focus on Open Biomedical Ontologies, digital classification tools that have become crucial to sharing results across research contexts in the biological and biomedical sciences, and argue that they constitute an example of classificatory theory. This form of theorizing emerges from classification practices in conjunction with experimental know-how and expresses the knowledge underpinning the analysis and interpretation of data disseminated online.

Funding: Economic and Social Research Council (ESRC); The British Academy; Leverhulme Trust

PhilPapers

Crossref

Open Research Exeter